10695672 


Ttcbiicfl  ItsMrck  Ntta  208 


EXPERIMENTAL  COMPARISON  OF  MONTE-CARLO 
SAMPLING  TECHNIQUES  TO  EVALUAn 
THE  MULTIVARIATE  NORMAL  INTEGRAL 


Elizabeth  N.  Abbe 


STATISTICAL  RESEARCH  AND  ANALYSIS  DIVISION 


I'':  OCT  21 196'’  ! 


U.  S.  Army 

Behavioral  Science  Research  Laboratory 

June  1969 


This  documenl  h«i  been  approved  for  public  ralaaaa  and  aala.  ita  diitnbution  la  unlimited. 

Reproduced  by  the 

CLEARINGHOUSE 
for  Federal  Scientific  &  Technical 
Information  Springfield  Va.  22)51 


Ttchilnl  likMitb  Nkta  201 


AD 


EXPERIMENTAL  COMPARISON  OF  MONTE-CARLO 
SAMPLING  TECHNIQUES  TO  EVALUATE 
THE  MULTIVARIATE  NORMAL  INTEGRAL 


Elizabeth  N.  Abbe 
Richard  C.  Sorenson,  Task  Leader 


STATISTICAL  RESEARCH  AND  ANALYSIS  DIVISION 
Cecil  D.  Johnson,  Chief 


U.  S.  ARMY  BEHAVIORAL  SCIENCE  RESEARCH  LABORATORY 

Office.  Chief  of  Research  and  Development 
Department  of  the  Army 

Room  239,  The  Commonwealth  Building 
1320  Wilson  Boulevard,  Arlington,  Virginia  22209 


June  1969 


Army  Project  Number 
200621 06 A723 


Optimization  Models  b-12 


This  document  has  been  approved  for  public  release  and  sale;  its  distribution  is  unlimited. 


FOREWORD 


BESRL's  OPTIMIZATION  MODELS  Work  Unit  seeks  to  provide  means  of  solution  to 
personnel  management  problems  relating  to  the  distribution,  training,  career  progression, 
reassignment,  and  utilization  of  personnel  in  current  and  future  Army  personnel  subsys¬ 
tems.  Personnel  systems  are  analyzed  and  areas  identified  for  which  objective  optimiza¬ 
tion  techniques  can  profitably  be  applied. 

Quantitative  models  and  computing  algorithms  for  optimal  assignment  and  for  evaluat¬ 
ing  the  feasibility  of  alternative  approaches  to  personnel  system  problems  have  been  pro¬ 
vided.  Optimization  techniques  are  developed  to  identify  the  "best"  policy  for  meeting 
future  military  contingencies  at  minimum  cost  in  terms  of  personnel  system  esources.  The 
present  Technical  Research  Note  compares  techniques  for  estimating  manpower  require¬ 
ments  where  a  number  of  individually  varying  skills,  performance  potentials,  background 
and  behavioral  factors  must  be  considered. 

The  entire  research  work  unit  is  responsive  to  requirements  by  RDT&E  Project 
2Q062106A723,  "Human  Performance  in  Military  Systems,"  FY  1969  Work  Program. 


EXPERIMENTAL  COMPARISON  OF  MONTE-CARLO  SAMPLING  TECHNIQUES 
TO  EVALUATE  THE  MULTIVARIATE  NORMAL  INTEGRAL 


BRIEF 


Requirement; 

To  represent  more  realistically  and  precisely  the  complex  and  multiple  interrelation¬ 
ships  within  the  Army  personnel  system,  continued  development  of  quantitative  and  tech¬ 
nical  methodology  is  required.  The  present  objective  was  to  evaluate  two  different  numeri¬ 
cal  methods  for  estimating  probability  when  a  multivariate  normal  model  (for  example,  one 
involving  scores  on  a  battery  of  tests)  can  be  assumed. 


Procedure; 

In  a  series  of  simulation  experiments  in  which  random  vector  observations  were 
generated,  probability  estimates  were  computed  by  each  of  the  methods.  Probability  rtiHions 
on  which  the  experiments  were  based  were  chosen  to  have  a  variety  of  properties.  The 
precision  of  the  two  methods  was  compared  from  the  magnitudes  of  the  variances  of  the 
probability  estimates  over  independent  samples. 


Findings; 

Results  appear  to  be  affected  both  by  the  size  of  the  probability  region  being  estimated 
and  by  the  goodness  of  the  approximation  of  the  sampling  distribution  to  the  unknown 
distribution.  The  more  complex  method  was  consistently  superior  for  very  small  probability 
regions:  but  when  the  sampling  approximation  was  poor,  the  precision  of  the  probability 
estimates  favored  the  simpler  approach. 


Utilization  of  Findings; 

The  computational  procedures  developed  in  conjunction  with  this  series  of  experiments 
are  considered  practical  methods  of  estimating  probability  based  on  multiple  scoria  for 
individuals  in  a  sample  population.  Estimation  problems  for  which  one  method  can  be 
expected  to  be  superior  to  the  other  were  clarified.  Changes  in  the  computational  proce¬ 
dures  from  which  methodological  improvements  may  be  expected  were  also  made  apparent. 
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EXPERIMENTAL  COMPARISON  OF  MONTE-CARLO  SAMPLING  TECHNIQUES 
TO  EVALUATE  THE  MULTIVARIATE  NORMAL  INTEGRAL 


Personnel  management  research  relating  to  manpower  has  led  to  the 
development  of  both  optimization  and  simulation  models  within  BESRL. 

These  models  can  relate  to  the  formulation  of  policies  regarding  man¬ 
power  requirements  and  are  effective  in  the  evaluation  of  such  policies. 
Realistic  estimates  require^  however^  that  multiple  behavioral  response 
patterns,  as  well  as  the  complex  Interrelationships  among  these  patterns, 
be  Incorporated  into  the  model  of  the  personnel  system.  The  continued 
development  of  more  advanced  quantitative  techniques  is  therefore 
required. 

The  research  reported  is  directed  toward  the  development  of  one 
such  quantitative  technique  for  use  with  models  for  which  the  multiple- 
response  patterns  can  be  characterized  by  a  particular,  but  very  commonly 
encountered  distribution,  the  multivariate  normal  distribution.  One 
example  of  the  application  of  this  technique  is  in  policy  formulation, 
where  it  may  be  necessary  to  estimate  the  proportionate  increase  of  men 
assigned  in  various  Army  Job  areas  which  would  occur  if  requirements  on 
certain  classification  criteria  were  made  more  lenient.  A  suggested 
policy  of  lowering  requirements  five  points  on  each  of  the  job  areas 
under  investigation  might  show,  for  instance,  that  too  few  men  would  be 
classified  into  these  areas  to  meet  increased  manpower  needs.  As  another 
example,  some  estimate  of  the  proportion  of  the  total  Army  population 
which  lies  between  certain  bounds  on  a  system  of  behavioral  measures 
might  be  required  before  performing  a  valid  simulation  experiment.  Such 
problems  require  the  Integration  of  the  multivariate  probability  dib  • 
trlbution  characterizing  the  population,  which  in  this  case  is  assumed 
to  be  the  multivariate  normal  distribution. 

A  general  analytical  method  for  integrating  the  multivariate  normal 
distribution  is  at  the  present  time  unavailable.  The  Integral  can  be 
approximated  by  numerical  quadrature,  but  a  very  large  number  of  data 
points  needs  to  be  generated  when  the  number  of  dimensions  is  large. 
Another  approach,  designed  to  improve  the  approximation  to  the  integral 
and  shorten  computation  time,  is  to  obtain  the  observations  by  random 
sampling  over  the  region  of  Integration  rather  than  use  the  systematic 
sampling  of  quadrature.  Such  a  'Yionte-Carlo"  approach  will  yield  prob¬ 
ability  estimates  which  vary  from  sample  to  sample.  The  larger  the 
variance,  however,  the  larger  is  the  number  of  points  which  must  be 
generated  to  obtain  a  given  degree  of  precision,  and  again  the  time 
required  for  computations  may  be  appreciable.  Assuming,  then,  that  a 
given  Monte-Carlo  method  yields  unbiased  estimates,  the  precision  of  the 
method  can  be  evaluated  from  the  magnitude  of  the  variance  of  the  esti¬ 
mates  over  independent  samples. 

Experimental  results  have  been  obtained  to  evaluate  two  different 
methods  of  Monte-Carlo  sampling  to  integrate  the  multivariate  normal  prob¬ 
ability  density  function.  Results  for  different  types  of  probability 
integrals  and  a  description  of  the  sampling  techniques  are  presented. 


MONTE-CARLO  METHODS 


Tallied  Sampling  (Monte-Carlo  Method  I) 

One  of  the  Monte-Carlo  methoda  deacrlbed  has  already  been  uaed  and 
reported  by  Hllller  (l).  Random  vector  obaervationa  are  generated  to 
have  the  multivariate  normal  diatributlon  of  Intereat.  The  proportion 
of  observations  which  fall  within  the  specified  region  of  integration 
constitutes  the  probability  eatimate.  Even  though  the  computations 
Involved  are  simple,  this  type  of  eatimate  shows  considerable  change 
from  sample  to  sample.  Such  variability  serves,  however,  as  a  useful 
standard  by  which  to  compare  the  behavior  of  a  second  method  of  Monte- 
Carlo  sampling,  more  complex  computationally,  but  which  has  showed 
promise  of  yielding  estimates  with  a  higher  degree  of  precision. 

The  Model  Is  described  below: 


Since  any  multivariate  normal  distribution  can  be  easily  converted  to 
standard  form,  methods  of  Integration  can  be  described  In  more  general  terms 
with  reference  to  the  standardized  distribution.  Let  X  - V 

represent  a  k-dlmenslonal  vector  of  random  variables  distributed  as  the 
multivariate  normal  with  mean  vector  0  and  correlation  matrix  R.  The 
multivariate  normal  Integral  over  a  given  region  A  ■ 

{k,  Ml);  (fa,  Mb),  ...,  (\,  Uj^)  is 

TT  -  f(X)dX 

•  ^^k/2  r'*'  •••  expc-^cxR*'  x*)! 

(2  -ii  Jfa 

To  estimate  TT,  introduce  a  new  random  variable  y,  where 
1/  -  1  If  i:  c  A 

(1) 

y  ■  0  otherwise. 

In  scalar  notation,  e  A  if  jfcj  <  <  Uj  for  J  ■  1,  2,  ...,  k. 


Then 


'  j!l  f!.  -  /J!  ^  •••>  V  ^  ••• 

li*  ^  ^  ^ 

+  j  *  •  •  f*  y  ^  (^1  ,  ^  ,  •  •  • ,  X.  )  dxa  •  • «  dx 

\  \  Ujj  ^ 


-  2  - 


»v  r 


o 

The  estlmetc  of  TT  is  the  random  variable  p  m.  -  ■ ,  where  •••> 

y  represent  a  sample  of  independent  random  variables  defined  as  In  (1)  from 

a 

n  vector  observations  X  .X  ,  ^  randomly  sampled  from  f(X). 

I  a  n 


If  yi  )  ya  *  ’**'  *  given  set  of  observations, 


n 


The  variance  of  the  binomial  variate  y  is  0*(y)  ■  H-LL-l-ILi  . 

.  -P  (1  -P) 

Its  estimate  is  ?iy)  ■  ■ 

or,  if  written  in  terms  of  the  observations,  ^  (y)  ■  . 


To  obtain  the  observed  values  yi  ,  ya ,  y^,  first,  n  vectors 

of  independent  random  normal  deviates  are  generated;  Z  has  mean  vector 
DV.  ■  0  end  the  Identity  correlation  matrix  I  >  R  ■  Var(Z'Z)  when  n  -*  <•. 
Then  X.  ■  Z.  ft*  for  i  -  1,  2,  n,  where  ft^  is  the  .quare  root  matrix 

^  ^  i  i  i 

of  ft.  Furthermore,  ■  0  and  Var  (X)  ■  ft*Z'ZSl*  ■  ft®Ift®  ■  ft  when  n  -•  «, 
as  is  required  for  the  parameters  which  characterize  f(X).  If,  for  each 
element  x^^  of  any  vector  X^,  fj  <  x^^j  ^  then  y^  is  set  to  1;  but  if 
any  element  of  X^  falls  outside  the  specified  Interval,  y^  1*  set  to  0. 


Importance  Sampling 

An  alternate  Monte-Carlo  approach  to  integration  is  referred  to  as 
importance  sampling.  Random  numbers  upon  which  the  estimates  of  the 
unknown  parameter  are  based  are  generated  from  a  distribution  other  than 
the  one  suggested  by  the  problem.  Each  value  of  this  "biased"  sample  la 
multiplied  by  an  appropriate  weighting  factor  which  corrects  for  having 


used  the  wrong  distribution.  The  probablliiy  that  a  sample  will  be  drawn 
from  an  "interesting"  or  "Important"  region  is  Increased  as  a  result  of 
the  biasing;  the  probability  that  It  will  come  from  an  "uninteresting" 
or  "unimportant"  region  is  decreased--a  desirable  result.  By  basing  more 
computations  on  random  numbers  generated  from  portions  of  the  probability 
region  with  which  the  manpower  problem  is  concerned,  it  is  possible  to 
Improve  the  accuracy--that  is,  reduce  the  variance--of  the  estimate. 


The  model  Is  developed  as  follows: 

Let  g(X)  “  g(J4,  ,  •••,  X.)  represent  a  k-varlate  uniform 

probability  density  function  defined  over  the  probability  region 
A  -  [(U  ,  ui  ),  (la,  ifc),  (Ij^,  Uj^)].  That  is. 


g(X) 

and 


i: 


0  If  X  c  A 
0  othexvlse 


I  fh  fa 

J^g(X)dX  -  ...  g(x^,  x^,  ...,  Xj^) 


(1) 

Let  h  represent  another  function  of  the  random  variable  X.  The  expected 
value  of  h  Is  defined  as 


"  .f .f-M  ***  ^  ~  •••»  g(^ »  Xja ,  •••»  ••• 

In  {Particular,  suppose 

h(X)  - 

with  f(X)  representing  the  multivariate  normal  probability  density  function 
whose  Integral  over  A  Is  required.  Then 


which  Is  the  unknown  probability  region  of  Interest. 
An  estimate  of  TT  Is  the  random  variable 


P 


1  ;  tW 
"  eW 


where  X  ,  X  , 
i'  a* 


X  are  random  vectors 


sampled  from  g(X). 


(2) 


(3) 


4 


Let  J/  ,  1/  ,  V  where 

1  a  “ 

of  Independent  random  variables 
variable  p  Is 


for  any  1,  represent  a 


‘'i  b(\) 

over  A.  Then  the  variance  of  the 


sample 

random 


-  all  “  ^  ) 

®  \i-  iii 


I 


f(A) 


The  variance  of  j/  is  o®  (y)  =  E(^*)-E^  (y) 


.  "r  CSl 

‘  Vo 


-i.  [y]- 

—  g(x)dxl  -  n* . 

(X) 


(4) 


An  estimate  of  E 

where  X  X  ,  ....  X 
1  a  a 

uniformly  distributed 


1  n 

(y)  is-  L  -j 

"  1-1  r 

represent  the  same  sample  of  random  variables 
over  A  and  used  to  estimate  TT. 


Standard  Importance  Sampling  Approach  to  Integration 

Different  Monte-Carlo  methods  for  estimating  TT  result  from  alternate 
choices  for  the  sampling  distribution  g(X).  In  one  of  the  simplest  and 
most  frequently  used  methods  of  Monte-Carlo  sampling,  sampling  Is  based 
on  the  uniform  distribution.  Random  vectors  with  limits  on  the  elements 
defined  by  the  probability  region  of  Interest  are  generated.  These 
observations  are  then  used  to  compute  the  ordinate  of  the  given  multi¬ 
variate  normal  distribution,  and  a  weighted  mean  of  the  ordinate  values 
provides  the  estimate  of  TT.  Since  random  uniform  numbers  within  specified 
limits  can  bc>  easily  generated,  no  computational  time  is  wasted  through 
rejection  of  any  sample  values.  In  more  detail,  let 


g(X)  -  g(X  ,  X  ., 
1  2 


i'  - 1.) 
n  J  J 
|j-i 

lo  otherwise 


[a: 


If  <  Xj  <  Uj 


This  Is  the  k-variate  uniform  probability  density  function  over  the 
unknown  probability  region  A  -  [li,  U|.  ) ,  (la,  Ua  )  >  •••,  with 

a  k-dlmenslonal  volumn,  the  cartesian  product  of  one  dimensional  Intervals 
(Ij,  J  “  2,  ...,  k,  being  denoted  by  [A].  As  is  required, 

g(X)  >  0  for  all  X  and 

^  L 


L 


-  5  - 


[A] 


Written  in  the  form  of  this  distribution,  the  estimate  of  IT  is 

P  ■  —  E  the  variance  is  o*  (v)  *-[CA]  J  f^(X)dxl  -  if , 

"  i«l  *  *■  “A 

where  *re  random  vectors  sampled  from  g(X). 

Observations  •••/  ^kll  which  have  the  g(X)  distribu 


tion  are  constructed  by  first  generating  n  vectors 

of  independent  random  numbers  which  are  uniform  over  the  interval  (O,  1). 
Then  -  (Zj^)  (a^,  ^  +  i  "  ® j »  i  )  ® 1  J-  The 

observed  probability  written  in  terms  of  these  observations  is 


l«fl 


i 


k 

n  (1. 


uj 


(2  >1  J  ^ 


exp  [.  ^  Xi')] 


where  is  the  correlation  matrix  for  the  given  multivariate  normal  dis¬ 
tribution  and  |R^|  is  the  determinant  of  R. 

Evaluating  the  relative  precision  of  different  procedures  for  esti¬ 
mating  TT  is,  of  course,  equivalent  to  examining  the  size  of  the  varia  ices 
of  the  different  estimates.  One  way  to  reduce  the  variability  of  the 
estimate  for  a  given  method  of  Monte-Carlo  sampling  is  to  adopt  some 
appropriately  designed  procedure  for  stratified  sampling.  The  probability 
region  of  Interest  A  Is  divided  Into  mutually  exclusive  subregions 
A  ,  A  ,  Ax,  an  Independent  probability  estimate  p  obtained  for  each 

A^,  and  the  sum  of  these  estimates  used  to  estimate  the  probability  over 

the  entire  region.  If  the  number  of  points  sampled  within  each  area  is 
proportionate  to  the  size  of  the  region  over  which  sampling  is  defined, 
then  it  can  be  shown  that  the  variance  of  the  estimate  of  TT  for  the 
stratified  sample  will  be  less  than  or  equal  to,  but  never  greater  than, 
the  variance  for  the  estimate  obtained  without  stratification. 

There  are,  of  course,  alternative  approaches  to  the  way  In  which  the 
total  area  A  is  divided  into  strata.  One  procedure,  particularly  appro¬ 
priate  when  sampling  proceeds  over  uniform  dimensions.  Is  "A  Modified 
Monte-Carlo  Quadrature"  presented  by  Haber  (2).  The  area  for  each  stratum 
is  the  accumulated  product  of  one-dimensional  uniform  univariate  distribu¬ 
tions  over  k  dimensions;  i.e., 

k 

A_  »  n  P  for  Pj  *  0,  1,  •••f  ”  1  a.nd  s  *1,  2,  >«•,  t. 

J  J 


The  intervals  within  each  of  the  k  dimensions  are  construct  ad  so  that  the 

,  are  commensurable;  in  the  simplest  case, 

J 


6 


all  of  them  may  be  multiples  of  the  smallest  one.  Consequently,  the 
k-dimensional  volumns  ,,,,  will  be  comnensurable,  and  the 

variance  of  the  estimate  based  on  the  sum  ot  estimates  over  t  strata 
will  be  less  than  or  equal  to  the  variance  of  an  estimate  based  on  an 
undivided  sample;  i.e., 

t 

Var  (  Z  p  )  ^  Var  (p)  . 

8-1  • 

Preliminary  experimental  results,  including  results  with  the 
advantage  of  sample  stratification,  indicated  that  the  uniform  distribu¬ 
tion  was  not  a  suitable  sampling  distribution  for  Integrating  the  multi¬ 
variate  normal  distribution.  The  choice  of  an  appropriate  design  for 
importance  sampling  is,  of  course,  not  routine,  and  the  variance  of  the 
sample  estimates  may  be  increased,  even  infinitely,  if  the  random  data 
points  are  generated  from  the  wrong  distribution.  It  did  appear  more 
profitable  to  examine  in  detail  experimental  results  based  on  a  differ¬ 
ent  Importance  sampling  design  which  showed  more  promise  of  having  the 
desired  property  of  reduced  sample  variance. 


Importance  Sampling:  Boldt  Method  (Method  II) 

An  original  approach  to  importance  sampling  was  devised  by  Boldt  (Ji)- 
Boldt  suggested  that  a  multivariate  normal  distribution  with  a  single 
coninon  factor  covariance  structure  often  is  a  good  approximation  to  covari¬ 
ance  structure  observed  in  practice  and  might  serve  as  a  suitable  solution 
for  the  integral  of  the  sampling  distribution  over  the  same  limits  required 
for  the  unknown  Integral.  A  simple  quadrature  solution  does  exist  for 
variance-covariance  matrices  with  the  coimon  factor  structure  and  has  been 
described  a  number  of  times  in  the  literature,  for  example,  by  Curnow  and 
Dunnett  (4)  and  by  Lord  (5).  Furthermore,  the  generation  of  random  obser¬ 
vations  which  have  one  common  factor  is  also  quite  simple.  To  force  the 
points  within  the  specified  limits  of  integration  complicates  the  compu¬ 
tational  procedure  somewhat,  but  does  not  result  in  a  significant  in¬ 
crease  in  computer  time. 


Development  of  the  model  follows: 


Identical  with  formula  (2),  the  integral  over  A  for  the  multivariate 
normal  distribution  is 


m. 

i(xj 


where 


P  1  " 

J^g(X)dX  -  1,  and  the  estimate  of  TT  (formula  3)  is  j?  ■  “  T. 


i-1 

where  X  ,  X  .  ,,,,  X  are  random  vectors  sampled  from  g(X).  Rather  than 

l  a  a 

representing  g(X)  by  the  uniform  distribution,  however,  sampling  is  from 
a  rank-one  multivariate  normal  distribution. 


-  7  - 


Retaining  the  convention  that  integration  will  be  performed  with  re- 
apect  to  standardized  variates,  let  R  represent  a  correlation  matrix  with 

the  structure  r^^j  ■  qTj  (i  i*  j),  where  -1  ^  or^  ^  +  1  for  i  ■  1,  2,  k. 

Then 

l<g  l*gl  "*  “P  [-4(x  Sg*  X')]  •  llg  *»(X)  .  1-  *»(X)  If  X  t  A  : 

0  otherwise 


The  cuDNilative  density  function  of  g(X)  can  be  expressed  as  a  single  integral 
having  a  product  of  univariate  normal  cumulative  density  functions  in  the 

integrand:  consequently,  p  is  easy  to  evaluate  nttmerically .  In  more  detail, 

R 

txp  -  R?)X']  f(x)<jx  and 


p  •  Pgl*,|i  iJi  “P  [-4 •  *?*)/•] 

llij* 

where  X  .  JT.,  ....  represent  the  random  sample  of  vectors  from  the 
*  • '  '  n 

distribution  g(X)  defined  over  A.  Based  on  (4),  an  estisMte  of  this 
variance  of  p  is 


where  vectors  sampled  from  g(X)  to 

estimate  TT. 
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Solution  for  a  factofd  lnfay«l«  To  obtain  thn  astiiMta  for  TT  •• 

■tated  in  p  ,  a  solution  for  the  intagral  of  tha  aampling  diatribu- 

S 

tion  g(X)  ovar  A,  the  probability  ragion  of  intareat,  ia  raquirad.  Tha 
sampling  distribution  gfX)  is  chosen  to  have  a  singia  cosmon  factor 
correlation  structure  and,  consequently,  can  be  factored  into  a  product 
of  univariate  integrals. 

In  more  detail,  variates  which  have  a  sHiltlvariatc 

normal  distribution  with  saro  means  and  correlation  coefficients 

tij  ■  OTj  OTj  (i  ^  J),  where  -1  s  cr^  *  1  for  i  ■  I,  2,  k,  are  ganer* 

ated  by  tha  formula  j  "  ( ^  j)^  Jfj  The 

1/  are  k  *f  1  Independent  standard  normal  variates.  The  cumulative  density 
function  of  interest  for  the  k  variates  X  is 

J 

<  hj  ;  all  j) 


3^#  •••»  •**  . 

Since  the  x's  are  mutually  independent,  F|^  can  be  expressed  as  the  product 
of  single  integrals;  l.e., 

\  •  j-m  ^  (hj  -  Ofjy)  /  (1  -  all  j)  f(y)dy 

where  fft)  -  exp  t*)  /  (2  TT)^ 
pt 

and  Fft)  •  f(t)dt 

This  integral  is  very  easy  to  evaluate  numerically  by  use  of  some 
quadrature  procedure  in  which  the  integral  is  replaced  by  a  sum.  In 
the  present  application,  the  infinite  interval  <•*,  >  is  taken  as 

[•^,  -f^l  and  is  divided  into  an  even  number  of  2m  ■  10<^  intervals  with 
end-points  defined  by  y  ,  y, ,  ...,  y^.  Then,  using  the  sum  given  by 
Simpson's  rule, 

h  ■ 

*  '•[ffy,)  ♦  r(ri)  *  ...  * 

*  t(rt)  *  ...  *  fCy  )]) 
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»h.r.  f(xj)  .  «p  ^  -i 

.nd  .,  . 


for  i  *0,  1 ,  2,  . . . ,  2in, 

Tha  unlvariat*  standard  cuaailatlva  danslty  functions  ars  evaluatad  by  a 
Hasting's  approximation  (Hastings,  6). 

Sines,  in  most  axamples,  it  is  not  the  cumulativa  density  function 
which  oust  ba  avaluatad,  but  tha  dansity  function  integrated  between  any 
limits  and  u^,  J  ■  1,  2,  k,  in  tha  infinite  range.  Therefore, 

computation  of  p^  is  facilitated  by  tha  formula 
Prob  [1  <jr  <U,X  «JP  sjf.  i  U,,] 

IXlSSd  ItKK 


-  ,  no.  of  lower  bounds  ^  ^ 

■  (-0  Prob  [Of  ‘  >  X,  ^  Ag  »  •••*  ^  A|jJ» 

all  2^ 

possible 

combinations 

where  Aj  takas  on  tha  value  1^  or  u^. 

Generation  of  Sample  Values.  Generation  of  the  Z  vectors  which  have 
the  rank-one  multivariate  normal  distribution  and  which  also  lie  within 
the  limits  defined  by  A  is  not  a  straightforward  computational  procedure. 
For  tha  k  elements  of  Z  generated  from  the  formula 

•j  ■  ■  "j’'’ 

letting 

Bj  -  (1  -  .'j)*. 

It  is  required  that 

Ij  <  Ej  ^  Uj,  where  1^  and  u^  are  the  lower  and  upper  bounds  defin¬ 
ing  the  j  ■  1,  2,  ...,  k  dimensions  for  a  given  probability  region  A. 

If  jr  X.  XV  random  normal  deviates  sampled  over  the  infinite 

i»ni#  •••  t  ^ 

domain,  tha  probability  that  the  resulting  x  ,  M  ,  ...,  a,,  will  lie  within 

*  a  A 

the  required  region  A  is  TT,  and  p,  an  estimate  of  TT,  is  determined  from 
the  proportion  of  the  sampled  vectors  which  lie  within  the  region  A.  It 
Is  desirable,  however,  to  generate  Z  variates  which  have  the  rank-one 
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dlitrlbution  without  rejecting  any  obeervationa.  If,  for  Inatance,  A  is 

very  email,  the  number  of  observation!  which  are  rejected  will  be  very 

large;  the  associated  amount  of  computation  will  also  be  very  large. 

Alternatively,  Z  values  can  be  constrained  to  lie  within  the  specified 

limits  by  appropriately  scaling  the  range  of  the  variates 

X  ,  X  ,  . . . ,  X,  ,  y.  Since 
1  a  ’  k’ 


TT 


«  Pirob  [{Ij  <  +  ajy  <  ^j]  .  J 

-  Rrob  ^  ^  u.l  -  Of.iy) ;  all  j] 


3. 


-  Prob  [{Ij*  ^  ^  Uj} ;  all  j],  letting  i*^  — ^ 


I,  -  y^y  * 

and  Uj 


"j ' 


the  variables  x  ,  x  ,  x.  can  be  constrained  to  lie  within  the  required 

18  k 

range  by  first  sampling  the  random  variable  y  anywhere  over  the  infinite 
domain  and  then  constructing  limits  for  the  x's  as  a  function  of  y.  This, 
of  course,  is  saying  that 

1  j]  • 

which  is  not  the  required  TT,  nor  does  such  a  method  yield  the  correct 
distribution  of  points  for  the  sampling  probability  density  function  g. 

To  clarify  the  discussion  to  follow,  suppose  y  can  take  on  exactly 
t  possible  value  s,  each  with  probability  TT^.  Then  the  unknown 

t 

TT  -  Z  TT  . 

1  s 
s«l 

To  generate  points  having  the  correct  distribution,  if  n  represents  a 
very  large  number  of  points, 

TT 

n  .  — - -  *  n  points 

t  s 


must  be  generated  for  each  value  of  s.  It  is  convenient  to  represent 
each  TT^  in  terms  of  conditional  probabilities: 


TT 


Prob 


Prob 


11 


The  conditional  probability  term  q  can  be  easily  computed: 

8 

q  »  Prob[{Xj  ^  Ujl  y  =8);  all  j] 

.Prob[til^ 

’^1  **J 

9j  "  Pj 


since  X  X^,  ,  X  are  independent  standard  normal  variates  and  q 
X "  ^  ^  ^  ic  ^ 

is  simply  the  accumulated  product  of  probabilities  associated  with 

bounds  of  normal  deviates  for  Independent  standard  normal  univariate 
distributions.  Furthermore,  assume  that  all 


Prob  [y  ■  g]  “  Prob  [yx  a»]  for  s,  s' 
values  of  y  are  equally  likely.  Than 


1,  2,  t;  i.e.,  that  all 


IT 


•s 


t 

s«l 


t 

2  q 

8«1 


S 


and 


n 


n 


t 

2  q 

8«1 
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Aside  from  purposes  of  explanation,  the  variate  y  is,  of  course, 

continuous.  Rather  than  t  discrete  values  which  y  can  take,  there  are  t 

intervals  or  ranges  of  values,  each  bounded  by  a  lower  and  upper  limit 

V  and  V  , , ,  say .  Then 
s  8+1  ’  ^ 

Prob|^{Xj  ^  ■j  ^  lAjI  Vg  ^  y  i  '''’g+l)l  jJ 


represents  a  range  of  probabilities  corresponding  to  the  range  of  y.  To 
estimate  this  range,  an  approximation  must  be  introduced,  letting  one 
probability  q  ,  say,  represent  the  range  of  possible  values.  If  suffi> 

ciently  small  Intervals  are  constructed  for  y,  the  probabilities  will 
have  a  small  range  for  each  interval,  and  the  mean  of  the  smallest  and 
largest  probabilities  should  yield  a  good  approximation;  alternatively, 
y  could  be  chosen  as  the  normal  deviate  corresponding  to  the  probability 
midpoint  of  each  Interval.  The  latter  alternative  was  chosen  for  the 
experiments  described  here. 
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THE  MONTE-CARLO  EXPERIMENTS 


Procedure 

The  relative  efficiencies  of  the  different  Monte-Carlo  methods  were 
examined  for  a  variety  of  experimental  problems.  Two  sets  of  runs  were 
prepared^  one  with  four  variates  and  the  other  with  eight;  the  experi¬ 
mental  problems  for  the  two  sets  were  designed  to  be  somewhat  comparable. 
The  correlation  matrix  characterizing  the  multivariate  normal  distri¬ 
bution  for  the  "unknown"  integral  was  chosen  to  have  a  single  common 
factor  with  all  correlations  of  the  structure  r^j  ■  Qr^Qfj,  where 

-1  ^  ^  +1  for  i  ■  1,  2,  ...,  k.  Then  the  unknown  probability  area 

could  be  verified  from  the  same  quadrature  formula  used  earlier  to  com¬ 
pute  the  probability  area  for  the  single  common  factor  distribution  from 
which  the  random  observations  are  generated.  Such  a  choice  for 

however,  lessens  the  representlveness  of  the  problem.  To  evaluate  the 
Importance  sampling  approach,  results  should  be  compared  when  sampling 
proceeds  from  a  distribution  which  poorly  approximates  the  distribution 
of  interest  as  well  as  when  both  distributions  each  have  only  one  common 
factor.  To  increase  the  generality  of  the  results,  therefore,  for  each 
single  common  factor  R^,  both  a  "good"  single  factor  approximation  and 

a  "poor"  single  factor  approximation  were  constructed  for  the  sampling 
distribution. 


All  correlations  in  R^  were  chosen  to  be  .90  (both  for  four  and  eight 


variates)  for  one  set  of  problems.  Such  high  positive  correlations  will 
yield  multidimensional  probability  regions  highly  concentrated  about  the 
center  of  the  distribution.  To  work  with  probability  regions  more  evenly 
distributed  over  the  Infinite  domain,  all  correlation  elements  were  set 
to  .10.  Four-  and  eight-variate  matrices  containing  randomly  selected 
positive  correlations  with  more  than  one  common  factor  were  also  con¬ 
structed,  although  a  quadrature  procedure  for  checking  the  Monte-Carlo 
estimates  was  not  available.  For  the  two  kinds  of  single  common  factor 
correlation  matrices  used  for  sampling,  one  R  had  the  structure 

rj,j  ■  ■  (•90)(*90)  “  .81,  and  the  other  had  the  structure 

r^^j  ■  (.i0)(.l0)  “  .01  (both  for  four-  and  eight-variate  problems).  For 


the  "unknown"  integral  with  all  r  «  .90,  the  R  with  all  correlations 

^  J  8 

equal  to  .81  represented  the  "good"  Importance  sampling  approximation; 
the  R  with  all  correlations  equal  to  .01  represented  the  "poor"  approxi- 


8 


with 


mat ion.  Conversely,  when  all  r^j  were  set  to  .10  in  R^,  the  Rg 

correlations  equal  to  .81  was  the  poor  sampling  approximation  and  the  R 
with  correlations  of  .01  was  the  good  one.  ^ 
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Probability  estimates  were  also  obtained  for  3  different  kinds  of 
Integration  limits.  One  set  of  limits  Included  only  the  central  portion 
of  the  distribution,  -1.  to  +1.  for  all  four  or  eight  variates.  Another 
set  Included  the  tails,  -»  to  0.  for  all  variates.  In  a  third  set,  the 
limits  contained  a  variety  of  both  tails  and  central  portions  (-3.  to  -1.; 
-1.  to  0;  0  to  1;  and  1.  to  5.  for  four  variates  and  -3.  to  1;  -3  to  2; 

-3  to  3»;  -2  to  1.;  -2  to  2.;  -2  to  3'J  "I*  to  2.;  -1  to  3*  fot  eight 
variates) . 

In  sumnary,  the  series  of  experiments  used  to  compare  the  relative 
efficiency  of  two  Monte-Carlo  methods  for  evaluating  the  multivariate 
normal  Integral  contained  four  categories  of  Independent  variable: 

1)  number  of  variates,  4  and  8;  2)  structure  of  the  multivariate  normal 
distribution  of  Interest,  as  characterized  by  a  one  common- factor  corre¬ 
lation  matrix  with  high  positive  correlations,  a  one  common- factor  corre¬ 
lation  matrix  with  low  positive  correlations,  and  a  correlation  matrix 
with  randomly  selected  positive  correlations;  3)  goodness  of  approxima¬ 
tion  of  the  sampling  distribution  to  the  multivariate  normal  distribution 
being  Integrated;  4)  range  of  Integration,  with  limits  Involving  only  the 
central  portion  of  the  distribution  or  Including  both  tall  and  central 
portions  as  well. 

All  possible  combinations  of  the  Independent  variables  planned  for 
the  series  of  experiments  totaled  36  problems;  I8  of  these  problems  were 
based  on  four-varlate  distributions  and  I8  comparable  problems  were  for  the 
eight-variate  distributions.  For  each  of  these  problems,  10  independent 
probability  estimates  were  obtained.  The  estimates  for  the  four-varlate 
problems  were  each  based  on  an  n  of  1000  random  vector  observations; 

10,000  random  vectors  were  generated  for  each  of  the  elght-varlate  problems. 


Results  from  the  Monte-Csrlo  Experiments 

Results  are  presented  In  Tables  1,  2,  and  3>  For  each  of  the  four- 
and  elght-varlate  series,  9  different  probability  regions  were  being 
evaluated.  The  first  line  of  values  presented  for  each  such  region  Is 
based  on  the  first  and  simplest  method  of  Monte-Carlo  sampling  described 
(Method  l).  The  second  two  lines  of  values  were  obtained  using  the  Boldt 
importance  sampling  approach  (Method  II),  but  with  one  set  of  estimates 
being  obtained  from  a  "good"  sampling  distribution  approximation  and  the 
other  set  from  a  "poor"  approximation. 

For  the  single  conmon- factor  distribution,  probabilities  computed  by 
quadrature  are  presented  In  the  tables  as  TT,  to  represent  the  "population" 
value.  The  average  squared  deviation  from  this  population  value  (^|TT) 
over  the  10  estimates  per  problem  was  used  as  a  measure  of  the  accuracy 
of  a  given  metho(^.  The  average  squared  deviation  from  the  observed  mean 


^  It  Is  the  within  sample  variances  which  are  presented  In  formulas  (4), 
(8),  and  (9).  The  measures  of  between  sample  variability,  however, 
were  more  discriminatory  of  the  effectiveness  of  the  different  Monte- 
Carlo  methods  and  are  therefore  the  ones  presented  In  the  tables. 
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(^|p)  based  on  each  set  of  10  samples  was  also  computed.  It  is  the 
square  roots  of  these  measures  (s|TT  and  b|  p,  "standard  deviations") 
that  are  presented  in  Tables  1  and  2.  The  standard  deviations  obtained 
for  the  simple  Monte-Carlo  method  were  used  as  the  baselines  by  which 
relative  amount  of  variation  for  different  problems  could  be  evaluated. 
These  ratios  of  standard  deviations  are  also  presented  in  Tables  1  and  2 
for  each  set  of  problems. 

The  first  observation  which  should  be  made  about  the  results  is  an 
apparent  equivalence  of  the  two  measures  of  variability,  s|tT  and  s|p. 

That  Is,  the  degree  to  which  one  Monte-Carlo  method  is  more  precise  than 
another  for  a  given  problem  appears  to  be  independent  of  whether  the 
deviations  of  the  estimates  are  taken  about  the  observed  mean  or  about 
the  population  value.  Of  course,  such  a  generalization  can  be  made  only 
with  respect  to  unknown  Integrals  based  on  single  common  factor  distri¬ 
butions.  Equivalence  of  the  measures,  if  equivalent  over  all  types  of 
problem,  could  be  taken  as  an  indication  that  the  Monte-Carlo  estimates, 
even  though  highly  variable,  are  unbiased  estimates  and  can  be  expected 
under  increased  sampling  to  converge  to  the  true  population  value. 

To  further  clarify  the  form  of  the  results,  the  ratios  of  standard 
deviations  were  extracted  from  Tables  1  and  2  and  regrouped  to  form 
Table  3*  Noting  that  a  ratio  of  standard  deviations  above  1.0  indicates 
superiority  of  the  Importance  sampling  method,  whereas  a  ratio  below  1.0 
favors  the  simpler  sampling  procedure,  a  striking  characteristic  of 
Table  3  Is  that  neither  method  is  consistently  superior  over  the  differ¬ 
ent  types  of  Integral  being  evaluated.  The  ideal  result,  of  course,  is 
to  find  some  method  which  yields  estimates  with  a  marked  reduction  of 
variance  on  all  problems.  With  the  results  shown  here,  the  types  of 
problem  for  which  one  method  might  be  superior  to  another  is  a  matter 
only  for  hypothesizing.  Ultimately,  perhaps,  the  most  economical  pro¬ 
cedure  with  respect  to  computer  time  will  be  the  design  of  some  test  in 
which  the  type  of  probability  integral  to  be  evaluated  is  examined  by 
the  computer  before  it  proceeds  to  analysis  by  one  of  several  alternate 
methods.  At  most,  these  experimental  results  may  have  bearing  on  some 
entirely  new  approach  to  the  problem  of  evaluating  multivariate  normal 
Integrals. 

An  independent  research  project  is  being  conducted  by  Cecil  Johnson 
(7)  at  BESRL  on  increasing  the  goodness  of  fit  of  the  single  common-factor 
sampling  distribution  to  the  multivariate  normal  distribution  of  Interest 
to  increase  the  precision  of  importance  sampling.  Preliminary  results 
of  this  research  indicate  that  using  Improved  methods  to  determine  the 
parameters  from  which  the  common  factor  random  entitles  are  generated 
can  result  in  an  appreciable  reduction  in  variance.  That  such  an  approach 
is  a  fruitful  one  is  supported  by  the  data  presented  in  the  present  pub¬ 
lication.  In  fact,  the  one  striking  observation  which  comes  from  Tables 
1,  2,  and  3  is  the  superiority  of  the  "good"  approximation  sampling  dis¬ 
tribution  to  the  "poor"  approximations  in  yielding  minimum  variance 
estimates.  This  advantage  shows  up  in  all  24  of  the  problems  where 
goodness  of  fit  was  compared  and  goes  as  high  as  42  to  1  in  problem  22 
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with  eight  variates.  One  should  be  aware,  of  course,  of  the  fact  that, 
when  the  sampling  approximation  is  very  good  (i.e.,  when  the  sampling 
distribution  is  nearly  identical  to  the  distribution  associated  with  the 
unknown  integral),  the  variable  components  on  which  an  importance  sampling 
estimate  is  based  are  very  small  relative  to  the  scaling  constant  com¬ 
puted  by  quadrature.  Therefore,  the  superiority  shown  in  problems  1,  4, 

7,  10,  15,  16,  22,  28,  and  34  is  partly  a  consequence  of  the  fact  that 
the  unknown  distribution  is  one  of  the  rare  distributions  whose  Integral 
value  is  near  to  the  value  computed  by  the  given  quadrature  formula,  a 
situation  not  to  be  encountered  often.  The  real  test  of  an  importance 
sampling  approach  is  whether  an  advantage  can  be  observed  when  the 
sampling  approximation  is  only  moderately  good  or  poor.  Such  an  advantage 
does  show  up  in  problems  3  ^nd  21  (  for  four  and  eight  variates)  and  9 
(for  four  variates),  both  poor  approximation  problems,  but  not  in  other 
poor  approximation  problems,  2,  8,  13,  20,  26,  and  35,  (for  four  and 
eight  variates)  and  27  and  32  (for  eight  variates). 

The  determination  of  a  suitable  single  common-factor  distribution 
and  the  manner  in  which  random  entitles  are  generated  from  this  distri¬ 
bution  is  not  a  straight-forward  procedure.  By  varying  such  a  procedure, 
precision  of  the  estimates  based  on  even  a  poor  sampling  distribution 
can  be  Improved.  The  results  presented  here  may  be  considered  relatively 
crude  with  respect  to  this  aspect  of  the  problem;  Improvement  is  expected 
when  the  procedures  under  investigation  by  Johnson  are  Incorporated  into 
the  techniques  described  here. 

The  estimates  in  Table  3  do  appear  to  favor  slightly  the  importance 
sampling  approach  when  the  region  of  integration  is  over  the  center  of 
the  distribution.  The  best  results,  in  problems  4  and  22  for  four  and 
eight  variates,  were  obtained  when  the  range  of  integration  was  from  -1. 
to  +1.,  the  center,  even  though  the  associated  multivariate  normal  dis¬ 
tribution  was  least  concentrated  about  the  center  (i.e.,  all  r's  >  .10). 
Furthermore,  the  ratios  of  standard  deviations  were  greater  than  one  in 
all  but  one  of  the  problems  designed  for  the  central  region,  regardless 
of  the  type  of  distribution  being  Integrated.  The  importance  sampling 
method  tends  to  be  relatively  less  efficient  when  tails  are  included  in 
the  integration  limits.  For  example,  problems  23  and  24  for  eight 
variates  Indicate  a  gross  failure  of  the  Imporatnce  sampling  method, 
although  these  problems  are  also  based  on  poor  approximations  to  eight 
variate  conmon  factor  distributions.  Furthermore,  any  observation  on 
tail  results  cannot  be  stated  too  conclusively,  since  the  tail  regions 
for  the  data  under  discussion  were  not  examined  to  the  exclusion  of  any 
of  the  more  central  regions. 

Clearly,  the  relative  advantage  of  one  sampling  method  over  another 
is  highly  sensitive  to  variations  in  the  region  of  integration  relative 
to  the  structure  of  ;:he  correlation  matrix.  For  instance,  the  ratios  for 
the  four-variate  distribution  characterized  by  a  single  common- factor 
correlation  matrix  with  large  positive  elements  (problems  1,  7,  IjJ,  19, 

23,  and  31)  Increased  by  a  multiple  of  about  2  when  the  Integration  limits 
included  a  variety  of  ranges,  even  though  the  probability  area  was  small. 
By  contrast,  when  the  correlation  matrix  had  very  small  positive  elements, 
the  advantage  of  the  importance  sampling  method  decreased  to  about  one- 
half  when  the  tails  were  included  (problems  4,  10,  16,  22,  28,  and  54)* 
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Table  1 


MONTE-CARLO  ESTIMATES  BASED  ON  TWO  DIFFERENT  SAMPLING  PROCEDURES  FOR 
EVALUATING  THE  MULTIVARIATE  NORMAL  INTEGRAL 

FOUR-VARIATE  PROBLEMS 

(N  -  1000) 


All  correlations  ■  .90;  Centered  limits:  -1  to  +1 

s  s 


Problem 

Method 

TT 

P 

s|  p 

ratio 

s(TT 

ratio 

1 

f 

I 

.5114 

.5121 

.02175 

.02177 

1 

II  r  -.81 

g 

Good 

.5114 

.5125 

.00871 

2.50 

.00877 

2.48 

2 

II  r  -.01 
g 

Poor 

.5114 

.5125 

.02856 

0.77 

.02838 

0.77 

All  correlations  -  . 

10:  Centered  limits:  -1 

to  +1 

Problem 

Method 

TT 

P 

s|p 

a 

ratio 

s|TT 

s 

ratio 

5 

I 

.2202 

.2205 

.00331 

.01331 

3 

II  r  -.81 

g 

Poor 

.2202 

.2156 

.01092 

1.22 

.01  ^ 

1.12 

4 

II  r  -.01 
g 

Good 

.2202 

.2204 

.00057 

55.56 

.00040 

33.46 

Ful 1  Rank,  ; 

positive  correlations; 

Centered 

limits : 

-1  to  +1 

Problem 

Method 

P 

•|P 

s 

ratio 

5 

I 

.3217 

.00212 

5 

II  r  -.81 

g 

.3187 

.01645 

1.29 

6 

II  r  -.01 
g 

.3226 

.00325 

6.56 

All  c.rrelations 

-  .90 

;  limits 

include 

tails: 

-•  to  0 

Problem 

Method 

TT 

P 

»|P 

s 

ratio 

s|TT 

s 

ratio 

7 

I 

.5695 

.5618 

.01639 

.01803 

7 

II  r  -.81 

g 

Good 

.5693 

.3705 

.00688 

2.58 

.00698 

2.58 

8 

II  r  -.01 
g 

Poor 

.3693 

.5296 

.05068 

.53 

.05018 

.36 

-  IT  - 
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Table  1  (continued) 


All  correlations  *  .10; 

limits 

Include 

tails :  to  0 

rroblem 

Method 

TT 

P 

s|  p 

s 

ratio  s|tT 

s 

ratio 

9 

I 

.0871 

.0869 

.01065 

.01066 

9 

II  r  ■.81  Poor 

g 

.0871 

.0843 

.00965 

1.11  .01003 

1.05 

10 

II  r  “.01  Good 
g 

.0871 

.0871 

.00075 

14.23  .00075 

14.22 

Full  rank,  positive 

correlations;  limits  include  tails:  -®  to 

0 

Problem 

Method 

P 

s|  p 

s 

ratio 

11 

I 

.1951 

.01504 

11 

II  r  -.81 

g 

.1954 

.01783 

.75 

12 

II  r  -.01 
g 

.1919 

.00892 

1 .46 

All  correlations 

-  .90; 

limits  are  varied 

Problem 

Method 

n 

P 

s|  p 

s 

ratio  s|  n 

s 

ratio 

13 

I 

.07054 

.07410 

.00780 

.00858 

15 

II  r  -.81  Good 

g 

.07054 

.07065 

.00145 

0 

0 

• 

CO 

• 

5.90 

14 

II  r  -.01  Poor 
g 

.07054 

.06926 

.00725 

1.08  .00754 

1.17 

All  correlations 

-  .10; 

limits  are  varied 

Problem 

Method 

n 

P 

s|  p 

S 

ratio  s|n 

s 

ratio 

15 

I 

.1555 

.1544 

.01547 

.01550 

15 

II  r  -.81  Poor 

g 

.1535 

.1465 

.02461 

.65  .02562 

.60 

16 

o 

II  r  -.01  Good 
g 

.1555 

.1554 

.00042 

36.74  .00043 

55.93 

Full  rank,  positive  correlations;  limits  are  varied 

Problem 

Method 

P 

s|  p 

s 

ratio 

17 

I 

.1188 

.01260 

17 

II  r  -.81 

g 

.1507 

.04198 

.50 

18 

II  r  -.01 
g 

.1148 

.00520 

5.94 

-  18  - 


Table  2 


MONTE- CARLO  ESTIMATES  BASED  ON  TWO  DIFFERENT  SAMPLING  PROCEDURES  FOR 
EVALUATING  THE  MULTIVARIATE  NORMAL  INTEGRAL 

EIGHT- VARIATE  PROBLEMS 

(N  *  10,000) 


All  correlations  *  .90 ;  centered  limits:  -1  to  +1 


Problem 

Method 

n 

p 

s|  p 

s 

ratio 

s|n 

s 

ratio 

19 

I 

.4302 

.4309 

.00335 

.00345 

19 

II  r  -.81 

g 

Good 

.4302 

.4304 

.00327 

1.05 

.00527 

1.05 

20 

II  r  -.01 
g 

Poor 

.4302 

.4245 

.01582 

.21 

.01682 

.20 

All  correlations  -  .10;  centered  limits:  -1 

to  +1 

Problem 

Method 

n 

p 

s|  p 

s 

ratio 

s|n 

s 

ratio 

21 

I 

.0499 

.0502 

.00229 

.00252 

21 

II  r  -.81 

g 

Poor 

.0499 

.0302 

.00214 

1.07 

.00216 

1.08 

22 

II  r  -.01 
g 

Good 

.0499 

.0499 

.00005 

45.02 

.00006 

38.87 

Full  rank,  ] 

positive  correlations; 

centered 

limits : 

-1  to  +1 

Problem 

Method 

P 

s|  p 

s 

ratio 

25 

I 

.1757 

.00378 

23 

II  r  -.81 

g 

.1731 

.00291 

1.30 

24 

II  r  -.01 
g 

.1751 

.00238 

1.59 

All  correlations 

-  .90; 

limits 

Include 

tails : 

-00  to  0 

Problem 

Method 

n 

P 

s|  p 

s 

ratio 

s|n 

s 

ratio 

25 

I 

.3211 

.3206 

.00412 

.00415 

25 

II  r  -.81 

g 

Good 

.3211 

.3212 

.00280 

1.47 

.00280 

1.48 

26 

II  r  -.01 
g 

Poor 

.3211 

.2590 

.02546 

.17 

.06640 

.06 

19 


Table  2  (continued) 


All  correlations  ■  .10; 

limits 

include 

tails: 

to  0 

Problem 

Method 

n 

P 

■1  P 

s 

ratio 

•In 

■ 

ratio 

27 

I 

.0141 

.0141 

.00095 

.00095 

27 

II  r  -.81  Poor  .0141 

g 

.0126 

.00136 

.70 

.00205 

.47 

28 

II  r  -.01  Good  .0141 

g 

.0141 

.00010 

9.05 

.00011 

8.64 

Full  rank^  positive  correlations;  limits  include  tails 

:  -•  to 

0 

Problem 

Method 

P 

s|  p 

8 

ratio 

29 

I 

.1549 

.00557 

29 

II  r  -.81 

g 

.1249 

.00775 

.44 

30 

II  r  -.01 
g 

.1206 

.00697 

.48 

All 

correlations 

-  .90: 

limits  are  varied 

Problem 

Method 

n 

P 

s|  p 

s 

ratio 

sjn 

8 

ratio 

51 

I 

.5962 

.5960 

.00564 

.00564 

51 

II  r  -.81 

g 

.5962 

•5966 

.00552 

1.02 

.00555 

1.02 

32 

II  r  -.01 
g 

.5962 

.5895 

.08477 

.07 

.08505 

.07 

All 

correlations 

-  .10; 

limits  are  varied 

Problem 

Method 

n 

p 

s|  p 

a 

ratio 

s|n 

8 

ratio 

35 

I 

.4359 

.4595 

.00579 

.00510 

33 

II  r  -.81 

g 

.4359 

.5128 

.05915 

.10 

.12919 

.04 

54 

II  r  -.01 
g 

.45^ 

.4565 

.00144 

2.64 

.00148 

5.45 

Full  rank, 

positive  correlations:  limits  are  varied 

Problem 

Method 

P 

s|  p 

8 

ratio 

35 

I 

.5254 

.00488 

35 

II  r  -.81 

.4900 

.09584 

.05 

36 

II  r  -.01 
g 

.5191 

.02511 

.19 
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Table  3 


STANDARD  DE\riATION  RATIOS:  VARIABILITY  OF  IMPORTANCE  SAMPLING 
PROBABILITY  ESTIMATES  REUTIVE  TO  METHOD  I 
MONTE -CARLO  ESTIMATES 


Correlac ion 
Structure 

Varied  Integratiou  Limits 

|r,) 

Problem 

No. 

S.D. 

Ratio 

Problem 

No. 

S.D. 

Ratio 

Problem 

No. 

S.D. 

Ratio 

1 

Fc 

lur  Variates 

I'N  -  1C 

)00) 

.')0 

.^'1,  good 

1 

2.30 

7 

2.38 

15 

5.38 

.90 

.01 ,  good 

2 

.77 

e 

.55 

14 

1.08 

.10 

.^^<1,  poor 

5 

1.22 

9 

1.11 

15 

.63 

.1C 

.01,  good 

4 

59.36 

10 

14.23 

16 

36.74 

Random 

.Pi 

5 

1.2'? 

11 

.75 

17 

.50 

Random 

.01 

6 

6.56 

12 

1.46 

18 

3.94 

Eight  Variates 

(N  -  10,000) 

.^0 

.•'1,  good 

19 

1.03 

25 

1.47 

1  51 

1.02 

,10 

.01,  poor 

20 

.21 

26 

.17 

32 

.07 

.10 

.Pi,  poor 

21 

1.07 

27 

.70 

35 

.10 

.10 

.01,  good 

22 

45.02 

28 

9.03 

34 

2.64 

Random 

.81 

25 

1.30 

29 

.44 

35 

.05 

Sscond  S«riet  of  MonlH-Corlo  FK^nriiniKilii 


An  addit ionnl  hpi  Iim:  of  I'xporimentH,  similar  to  the  previous  series^ 
was  designed  to  examine  the  prei-tulon  of  Monte-Carlo  estimates  for  very 
small  probability  rcKionH.  A  variety  of  single-factor  problems,  for  which 
IT  could  be  easily  approximated  by  quadrature,  was  evaluated  on  the  com¬ 
puter  until  a  set  of  probability  regions  similar  in  magnitude,  but  small, 
were  found.  The  largest  of  these  regions  was  .0296^  and  the  smallest  was 
.00007 •  One  set  of  experiments  was  designed  for  four  variates  and  another 
for  eight  variates.  The  correlation  matrices  characterizing  the  distri¬ 
butions  to  be  integrated  were  the  same  single  common-factor  matrices  used 
for  the  previous  problems;  all  correlation  coefficients  in  these  matrices 
were  equal  either  to  .90  or  to  .10. 

In  the  second  experimental  series,  results  were  highly  encouraging 
in  favor  of  the  importance  sampling  approach.  With  small  TT,  the  ratios 
of  standard  deviations  measuring  precision  of  Monte-Carlo  Method  II, 
relative  to  Method  I,  were  quite  large,  ranging  from  7  to  1  to  as  high 
as  2989  to  1.  The  best  result  for  the  problems  previously  discussed  was 
only  45  to  1  (problem  22,  Table  5).  Furthermore,  the  direction  of  the 
results  was  consistent.  For  every  problem.  Method  II  was  more  precise 
than  Method  I.  The  two  variance  measures,  with  deviations  taken  about 
the  observed  mean  in  one  measure  and  about  the  population  parameter  in 
the  other,  gave  comparable  results,  as  they  did  in  the  first  series  of 
problems.  This  conformity  was  again  taken  to  be  an  indication  of  lack 
of  bias  in  the  results. 

Judging  by  an  inspection  of  the  results  of  Tables  4  and  the 
advantage  favoring  the  importance  sampling  method  appears  to  be  related 
only  to  the  size  of  the  probability  region.  Central  regions  versus  tall 
regions,  or  four  versus  eight  variates,  do  not  appear  to  distinguish  one 
method  from  another.  In  general,  better  results  were  obtained  with 
Method  II  when  the  sampling  distribution  approximation  was  good,  although 
exceptions  occurred  in  problems  42,  44,  and  46. 

The  point  where  the  advantage  due  to  size  of  TT  disappears  is,  of 
course,  not  entirely  clear.  The  very  large  ratios  of  Tables  4  and  ^  are 
for  TT's  of  .003  or  less.  Note  that  in  problem  47,  when  TT  is  as  large  as 
.030,  the  ratio  drops  to  only  7  to  1.  The  ambiguous  results  presented 
in  Tables  1  and  2,  with  Method  I  being  superior  in  some  problems  and 
Method  II  being  superior  in  others,  usually  involved  very  large  TT's  (e.g., 
.5114  or  .4302).  But  when  TT  was  small  (  .0499  or  .0705),  as  in  problems 
13,  14,  21,  and  22,  results  favored  the  importance  sampling  approach. 

On  the  other  hand,  TT  for  problem  27  was  small,  .0141,  and  the  relative 
sizes  of  the  variances  indicate  greater  precision  in  the  brute-force 
method.  Very  likely,  some  interaction  is  occurring  between  size  of  i.ne 
probability  region  and  adequacy  of  the  sampling  distribution  approxima¬ 
tion.  Results  for  Method  II  tended  to  be  poor  when  integration  was  over 
the  tails  of  a  relatively  flat  multivariate  normal  distribution  (such  as 
when  all  r's  equaled  .10),  and  sampling  was  from  a  distribution  with  most 
of  its  mass  concentrated  over  the  center  (as  when  all  r's  *  .90);  observe 
problems  9  and  27. 
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Table  4 


CCMPARISON  OF  MONTE-CARLO  ESTIMATES  FOR  SMALL  PROBABILITY  REGIONS  IN  EITHER 
THE  TAILS  OR  THE  CENTER  OF  THE  MULTIVARIATE  NORMAL  DISTRIBUTION 

FOUR-VARIATE  PROBLEMS 


(N  “ 

1000) 

All  correlations 

0 

N 

Central  limits: 

0  to  . 

.50 

Problem 

Method 

n 

p 

sjp 

S 

ratio 

s|n 

s 

ratio 

37 

I 

.00298 

.00370 

1.7349x10"^ 

1.8776x10"^ 

57 

II  r  ".81  Good 

8 

.00298 

.00296 

2.7515x10"® 

650.6 

1 .7728x10"^ 

105.9 

38 

II  r  “.Ol  Poor 

8 

.00298 

.00296 

6.2519x10"® 

277.5 

1 .9058x10"^ 

98.5 

All  correlations 

“  .10; 

Central  limits: 

0  to  . 

50 

Probl  em 

Method 

n 

p 

s|p 

s 

ratio 

s|n 

s 

ratio 

39 

I 

.00020 

.00000 

0 

2.0100x10*^ 

39 

II  r  “.81  Poor 
g 

.00020 

.00020 

1 .8481x10"'^ 

0 

1.5733x10"® 

127.8 

40 

II  r  “.01  Good 

8 

.00020 

.00020 

5.8555x10"® 

0 

1.6129x10"® 

124.6 

All  correlations 

0 

1 

Tall  limits:  1 

.9  to  2. 

5 

Problem 

Method 

n 

p 

s|p 

s 

ratio 

s|n 

8 

ratio 

41 

I 

.00266 

.00100 

1 .0954x10"^ 

1 .9855x10"^ 

41 

II  r  “.81  Good 

g 

.00266 

.00267 

8.9168x10"® 

122.8 

1 .5949x10"^ 

142.3 

42 

II  r  “.01  Poor 

8 

.00266 

.00265 

5.9575x10"5 

18.4 

5.9927x10’^ 

53.1 

All  correlations 

“  .10; 

Tall  limits:  1 

.5  to  2. 

5 

Problem 

Method 

n 

P 

sIp 

s 

ratio 

s|n 

S 

ratio 

45 

I 

.00007 

.00010 

5.0000x10"'^ 

3.0170x10"'* 

43 

II  r  “.81  Poor 

8 

.00007 

.00007 

8. 9698x10 

554.4 

1.5096x10"® 

199.9 

44 

II  r  “.01  Good 

8 

.00007 

.00007 

5.7936x10"^ 

517.8 

7.1184x10"^ 

423.8 

25  - 
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Table  5 


COMPARISON  OF  MONTE-CARLO  ESTIMATES  FOR  SMALL  PROBABILITY  REGIONS  IN  EITHER 
THE  TAILS  OR  THE  CENTER  OF  THE  MULTIVARIATE  NORMAL  DISTRIBUTION 

EIGHT-VARIATE  PROBLEMS 
(N  -  10,000) 


All  correlations 

-  .90; 

Central  limits: 

0  to 

.55 

Problem 

Method 

n 

p 

s|  p 

s 

ratio 

s|n 

s 

ratio 

45 

I 

.00012 

.00015 

1.1874x10”^ 

1.1945x10"* 

45 

II  r  - 

g 

.81 

Good 

.00012 

.00012 

7.5757x10’'^ 

1567.8 

8.2050x10"^ 

145.6 

46 

O 

II  r  - 
g 

.01 

Poor 

.00012 

.00012 

1.4855x10“^ 

799-5 

8.0974x10""^ 

147.5 

All  correlations 

-  .10; 

Central  limits: 

0  to 

.85 

Problem 

Method 

n 

P 

s|  p 

s 

ratio 

s|n 

8 

ratio 

47 

I 

.02965 

.02980 

1.6528x10"^ 

1.6598x10"^ 

47 

II  r  - 
g 

II  r  - 
g 

.81 

Good 

.02965 

.02965 

1.0044x10"'* 

16.3 

1.0295x10"* 

15.9 

48 

.01 

Poor 

.02965 

.02959 

2.1859x10"'* 

7.5 

2.2927x10"* 

7.2 

All  correlations 

-  .10; 

Central  limits: 

0  to 

.85 

Problem 

Method 

n 

P 

s|  p 

s 

ratio 

s|n 

s 

ratio 

49 

I 

.00010 

.00015 

g.ooooxio"-^ 

9.4868x10"^ 

49 

II  r  - 

.81 

Poor 

.00010 

.00010 

5.6664xlo"^ 

245.6 

4.6482x10"^ 

204.1 

50 

g 

II  r  - 
g 

.01 

Good 

.00010 

.00010 

5.4429x10"® 

2614.1 

1.9871x10"^ 

477.4 

All  correlations 

-  .90; 

Tall  limits;  1 

.8  to  2.5 

Problem 

Method 

n 

p 

s|  p 

s 

ratio 

s|n 

s 

ratio 

51 

I 

.00095 

.00088 

2.8214x10"^ 

2.9145x10"* 

51 

II  r  - 

.81 

Good 

.00095 

.00095 

2.4541x10"^ 

115.9 

2.5709x10"® 

115.4 

52 

g 

II  r  - 
g 

.01 

Poor 

.00095 

.00096 

9.7721x10"® 

28.9 

1.0l21xl0"^ 

28.8 

All  correlations 

-  .10; 

Tall  limits:  0.8  to  2.5 

Problem 

Method 

n 

p 

s|  p 

s 

ratio 

s|n 

s 

ratio 

55 

I 

.00008 

.00009 

8.5066x10"^ 

8.4077x10"^ 

55 

II  r 

g 

.81 

Poor 

.00008 

.00008 

1.5748x10"® 

52.7 

1.5968x10"® 

52.6 

54 

II  r 

g 

.01 

Good 

.00008 

.00008 

5.2592x10""^ 

256.4 

5.5284x10"^ 

258.5 

24 


One  implication  of  the  results  obtained  for  small  TT's  is  that  the 
recommended  importance  sampling  approach  could  be  improved  by  introducing 
sample  stratification.  The  probability  region  A  would  be  divided  into 
mutually  exclusive  regions  A^i  •••,  A^,  and  the  final  estimate  p 

would  be  the  sum  of  t  independent  estimates  Pj^,  p^;  i.e., 

A  ^ 

TT  ■  p  >  Z  p  .  Some  properties  of  sample  stratification  were  presented 
8*1  * 

under  the  section  describing  Importance  sampling  based  on  a  uniform 
sampling  distribution.  The  advantage  stated  there  was  that,  when  the 
number  of  points  sampled  in  each  region  is  proportional  to  the  magnitude 
of  the  region,  the  within  sample  variance  based  on  stratification  will 
be  less  than  or  equal  to  the  variance  for  an  unstratified  sample.  The 
importance  sampling  method  used  in  the  experiments  should  have  an  addi¬ 
tional  advantage  because  the  probability  regions  will  be  small. 


SUMMARY 

Experimeaial  results  have  been  presented  to  evaluate  two  different 
methods  of  Monte-Carlo  sampling  to  integrate  the  multivariate  normal  dis¬ 
tribution.  Using  random  vector  observations  generated  to  have  the  distri¬ 
bution  of  Interest,  one  method  is  basically  a  count  of  the  observations 
which  lie  within  the  region  of  integration.  A  more  complex  method  is  an 
adaptation  of  importance  sampling  in  which  a  single  comnon- factor  multi¬ 
variate  normal  distribution  is  the  sampling  distribution.  Random  vector 
entities  are  constrained  in  such  a  way  that  only  observations  which  lie 
within  the  specific  integration  limits  are  generated. 

The  precision  of  the  estimates  for  the  two  methods  was  compared  from 
the  magnitude  of  the  variances  of  the  estimates  over  independent  samples. 
The  form  of  the  results  appears  to  be  affected  both  by  the  size  of  the 
probability  region  over  which  integration  is  performed  and  also  by  the 
goodness  of  fit  of  the  importance  sampling  distribution  to  the  distribu¬ 
tion  under  evaluation.  When  the  probability  region  is  very  small,  the 
importance  sampling  approach  is  clearly  the  more  efficient  method.  For 
larger  probability  regions,  the  importance  sampling  approach  is  superior 
to  the  tallied  method  when  the  sampling  approximation  is  a  good  approxi¬ 
mation.  When  the  importance  sampling  approximation  is  poor,  the  precision 
of  the  probability  estimate  favors  the  simpler  Monte-Carlo  procedure.  The 
results  indicate  that  applying  the  importance  sampling  technique  to  a 
probability  region  divided  into  strata,  a  very  small  probability  being 
associated  with  each  stratum,  will  yield  precise  estimates  for  a  wide 
variety  of  problems.  Sampling  distributions  other  than  the  multivariate 
normal  might  also  be  superior  for  selected  portions  of  the  probability 
region.  Finally,  BESRL  research  scientists  are  continuing  studies  to 
improve  the  adequacy  of  the  approximation  of  the  one  coiranon- factor  dis¬ 
tribution  to  a  multivariate  normal  distribution  of  interest.  Information 
gained  from  the  new  research  may  be  expected  to  increase  the  efficiency 
of  the  techniques  described  here. 
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